A bi-lingual Mandarin/taiwanese (min-nan), large vocabulary, continuous speech recognition system based on the tong-yong phonetic alphabet (TYPA)

Authors

Ren-Yuan Lyu

Chi-yu Chen

Yuang-Chin Chiang

Min-shung Liang

Abstract

In this paper, we describe the first Mandarin/Taiwanese (Min-nan) bi-lingual continuous speech recognition system for large-vocabulary or vocabulary-independent applications. A phonetic transcription system called the Tong-yong Phonetic Alphabet (TYPA) is described and used to transcribe the bilingual Mandarin/Taiwanese lexicons. Right-Context-Dependent (RCD) phonetic continuous-density Hidden Markov Models (CHMM) based on TYPA are used as the acoustic models. A lexicon tree containing 40 thousand bilingual words is used as the search network to evaluate the performance of the recognizer. A word accuracy of 92.55% is achieved in the speaker-dependent case. Furthermore, we construct a real-time continuous-speech demonstration system based on the vocabulary-independent RCD models for the specific application domain of automated hospital appointment scheduling, where mixed Mandarin/Taiwanese speech is very likely to occur.
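The two structures named in the abstract, right-context-dependent units and a lexicon tree, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the phone symbols are placeholders rather than actual TYPA symbols, the `M:`/`T:` word labels and the `sil` end-of-word context are hypothetical, and the function names are not taken from the paper. It shows the general idea only, not the authors' implementation.

```python
from dataclasses import dataclass, field


def to_rcd_units(phones, end_context="sil"):
    """Label each phone with its right neighbour, e.g. 'b+a'.

    The last phone takes `end_context` (an assumed silence symbol)
    as its right context.
    """
    units = []
    for i, phone in enumerate(phones):
        right = phones[i + 1] if i + 1 < len(phones) else end_context
        units.append(f"{phone}+{right}")
    return units


@dataclass
class Node:
    """One node of the lexicon tree; `words` lists entries ending here."""
    children: dict = field(default_factory=dict)
    words: list = field(default_factory=list)


def build_lexicon_tree(lexicon):
    """Build a prefix tree so words sharing initial phones share nodes."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count all nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical toy lexicon: placeholder phones, not real TYPA transcriptions.
# "M:" marks an assumed Mandarin entry, "T:" an assumed Taiwanese entry.
TOY_LEXICON = {
    "M:word1": ["b", "a"],
    "T:word2": ["b", "a", "n"],
    "T:word3": ["b", "i"],
}

if __name__ == "__main__":
    tree = build_lexicon_tree(TOY_LEXICON)
    print(to_rcd_units(TOY_LEXICON["T:word2"]))
    print(count_nodes(tree))
```

Because both languages are transcribed with one phone inventory, entries from either language can share prefixes in a single tree; in the toy lexicon the three words occupy four phone nodes plus the root instead of seven separate phone nodes.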

Similar resources

Speaker Independent Acoustic Modeling for Large Vocabulary Bi-lingual Taiwanese/mandarin Continuous Speech Recognition

In this paper, we describe the acoustic modelling technique for a bi-lingual Taiwanese/Mandarin speech recognition system, which deals with speaker-independent continuous speech based on HMMs clustered by an acoustic phonetic decision tree. A bi-lingual recogniser with a bilingual database of 120 people was built. The vocabulary size of this system is up to 40 thousand. Unigram, bi-gram, and ...

Large vocabulary taiwanese (min-nan) speech recognition using tone features and statistical pronunciation modeling

A large vocabulary Taiwanese (Min-nan) speech recognition system is described in this paper. Due to the severe multiple pronunciation phenomenon in Taiwanese partly caused by tone sandhi, a statistical pronunciation modeling technique based on tonal features is used. This system is speaker independent. It was trained by a bi-lingual Mandarin/Taiwanese speech corpus to alleviate the lack of pure...

Modeling Pronunciation Variation for Bi-Lingual Mandarin/Taiwanese Speech Recognition

In this paper, a bi-lingual large vocabulary speech recognition experiment based on the idea of modeling pronunciation variations is described. The two languages under study are Mandarin Chinese and Taiwanese (Min-nan). These two languages are basically mutually unintelligible, and they have many words with the same Chinese characters and the same meanings, although they are pronounced differen...

Construct a multi-lingual speech corpus in taiwan with extracting phonetically balanced articles

In this paper, we describe an initial stage in constructing a multilingual speech corpus in Taiwan by selecting phonetically balanced scripts. The corpus is expected to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. To achieve the objective, constructing a multilingual phonetic alphabet, namely For...

A Taiwanese (min-nan) text-to-speech (TTS) system based on automatically generated synthetic units

A Taiwanese (Min-nan) Text-to-Speech (TTS) system has been constructed in this paper based on automatically generated synthetic units by considering several specific phonetic and linguistic characteristics of Taiwanese. Some basic facts about Taiwanese that are useful in a TTS system are summarized, including the issues of tone sandhi, the written format, and others. Three functional modules, namely a ...

Publication date: 2000
